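Graph neural networks (GNNs) are widely used for graph representation learning. Despite their prevalence, GNNs suffer from two drawbacks in graph classification: they neglect graph-level relationships, and they generalize poorly. Each graph is processed separately during GNN message passing and graph pooling, and existing methods for addressing overfitting operate on each individual graph. This makes the learned graph representations less effective for downstream classification. In this paper, we propose a Class-Aware Representation rEfinement (CARE) framework for graph classification. CARE computes simple yet powerful class representations and injects them to steer the learning of graph representations towards better class separability. CARE is a highly flexible plug-and-play framework that can incorporate arbitrary GNN backbones without significantly increasing the computational cost. We also prove theoretically, through Vapnik-Chervonenkis (VC) dimension analysis, that CARE has a better generalization upper bound than its GNN backbone. Our extensive experiments with 10 well-known GNN backbones on 9 benchmark datasets validate the superiority and effectiveness of CARE over its GNN counterparts.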
translated by 谷歌翻译
With the recent advances in video and 3D understanding, novel 4D spatio-temporal challenges fusing both concepts have emerged. In this direction, the Ego4D Episodic Memory Benchmark proposed a task for Visual Queries with 3D Localization (VQ3D). Given an egocentric video clip and an image crop depicting a query object, the goal is to localize the 3D position of the center of that query object with respect to the camera pose of a query frame. Current methods tackle VQ3D by lifting the 2D localization results of the sister task, Visual Queries with 2D Localization (VQ2D), into a 3D reconstruction. Yet we point out that the low number of Queries with Poses (QwP) in previous VQ3D methods severely hinders their overall success rate and highlights the need for further effort in 3D modeling to tackle the VQ3D task. In this work, we formalize a pipeline that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos. We estimate more robust camera poses, leading to more successful object queries and substantially improved VQ3D performance. In practice, our method reaches a top-1 overall success rate of 86.36% on the Ego4D Episodic Memory Benchmark VQ3D, a 10x improvement over the previous state-of-the-art. In addition, we provide a complete empirical study highlighting the remaining challenges in VQ3D.
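The idea of lifting a 2D localization into 3D and expressing it relative to a query frame can be illustrated with a minimal sketch. It assumes a pinhole camera with known intrinsics, a known depth at the detected object center, and 4x4 camera-to-world pose matrices. The function names and this formulation are illustrative assumptions, not the pipeline of the paper.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a known depth to a 3D point in the camera frame."""
    return [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]


def apply_pose(pose, p):
    """Apply a 4x4 rigid transform (nested lists, row-major) to a 3D point."""
    return [sum(pose[r][c] * p[c] for c in range(3)) + pose[r][3] for r in range(3)]


def invert_pose(pose):
    """Invert a rigid transform [R | t] as [R^T | -R^T t]."""
    rot_t = [[pose[c][r] for c in range(3)] for r in range(3)]
    trans = [-sum(rot_t[r][c] * pose[c][3] for c in range(3)) for r in range(3)]
    return [rot_t[r] + [trans[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def localize_in_query_frame(pixel, depth, intrinsics, pose_detection, pose_query):
    """Hypothetical helper: object center expressed in the query camera frame.

    pose_detection and pose_query are assumed to be camera-to-world transforms.
    """
    fx, fy, cx, cy = intrinsics
    p_cam = backproject(pixel[0], pixel[1], depth, fx, fy, cx, cy)
    p_world = apply_pose(pose_detection, p_cam)          # detection camera -> world
    return apply_pose(invert_pose(pose_query), p_world)  # world -> query camera
```

The sketch makes the dependence on camera poses explicit: if either pose is missing for a frame, the query cannot be answered, which is why the number of queries with poses bounds the achievable success rate.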
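PointNet++ is one of the most influential neural architectures for point cloud understanding. Although the accuracy of PointNet++ has been largely surpassed by recent networks such as PointMLP and Point Transformer, we find that a large portion of the performance gain is due to improved training strategies, i.e., data augmentation and optimization techniques, rather than architectural innovations. Thus, the full potential of PointNet++ has yet to be explored. In this work, we revisit the classical PointNet++ through a systematic study of model training and scaling strategies, and offer two major contributions. First, we propose a set of improved training strategies that significantly boost the performance of PointNet++. For example, we show that, without any change in architecture, the overall accuracy (OA) of PointNet++ on ScanObjectNN object classification can be raised from 77.9% to 86.1%, even surpassing the state-of-the-art PointMLP. Second, we introduce an inverted residual bottleneck design and separable MLPs into PointNet++ to enable efficient and effective model scaling, and propose PointNext, the next version of PointNets. PointNext can be flexibly scaled up and outperforms state-of-the-art methods on both 3D classification and segmentation tasks. For classification, PointNext reaches an overall accuracy of 87.7% on ScanObjectNN, surpassing PointMLP by 2.3% while being 10x faster in inference. For semantic segmentation, PointNext establishes a new state-of-the-art performance with 74.9% mean IoU on S3DIS (6-fold cross-validation), outperforming the recent Point Transformer. Code and models are available at https://github.com/guochengqian/pointNext.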
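For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of overall accuracy (OA) and mean intersection-over-union (mean IoU) over flat label lists. The function names are illustrative, and the handling of classes that never appear (they are skipped here) is an assumption; individual benchmarks may follow a different convention.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the ground truth."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mean_iou(y_true, y_pred, num_classes):
    """Average of per-class intersection-over-union scores."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:  # assumption: skip classes absent from labels and predictions
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

With ground truth `[0, 0, 1, 1]` and predictions `[0, 1, 1, 1]`, three of four labels match, while the per-class IoU values are 1/2 and 2/3, so the two metrics can diverge noticeably on the same predictions.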
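While neural networks have been remarkably successful in a wide array of applications, implementing them on resource-constrained hardware remains an area of intense research. Replacing the weights of a neural network with quantized (e.g., 4-bit or binary) counterparts yields massive savings in computation cost, memory, and power consumption. To that end, we generalize GPFQ, a post-training neural-network quantization method based on a greedy path-following mechanism. Among other things, we propose modifications that promote sparsity of the weights, and we rigorously analyze the associated error. Additionally, our error analysis extends the results of previous work on GPFQ to handle general quantization alphabets, showing that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights, that is, in the level of over-parametrization. Our result holds across a range of input distributions and for both fully connected and convolutional architectures, thereby extending previous results. To evaluate the method empirically, we quantize several common architectures with few bits per weight and test them on ImageNet, showing only a minor loss of accuracy compared to unquantized models. We also demonstrate that standard modifications, such as bias correction and mixed-precision quantization, further improve accuracy.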
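To make the greedy path-following idea concrete, here is a minimal sketch for a single neuron. It assumes the following update rule, which the abstract itself does not spell out: weights are quantized one at a time, and each choice is made so that the running output error `u` on the given data stays small. All names are illustrative, and the sparsity-promoting modifications mentioned above are not included.

```python
def nearest(alphabet, x):
    """Round x to the closest element of the quantization alphabet."""
    return min(alphabet, key=lambda a: abs(a - x))


def greedy_quantize(weights, columns, alphabet):
    """Quantize one neuron's weights sequentially (sketch, assumed update rule).

    weights: list of N floats.
    columns: list of N data columns, each a list of m floats (column t holds
             the values of input feature t over m samples).
    Returns the quantized weights and the final output error vector u,
    where u = sum_t (w_t - q_t) * column_t.
    """
    u = [0.0] * len(columns[0])
    quantized = []
    for w_t, x_t in zip(weights, columns):
        norm_sq = sum(v * v for v in x_t)
        if norm_sq == 0.0:
            q_t = nearest(alphabet, w_t)  # feature carries no signal: plain rounding
        else:
            # Project the error we would have with the exact weight onto x_t,
            # then round the resulting coefficient to the alphabet.
            target = [u_i + w_t * x_i for u_i, x_i in zip(u, x_t)]
            coeff = sum(t_i * x_i for t_i, x_i in zip(target, x_t)) / norm_sq
            q_t = nearest(alphabet, coeff)
        u = [u_i + (w_t - q_t) * x_i for u_i, x_i in zip(u, x_t)]
        quantized.append(q_t)
    return quantized, u
```

For weights `[0.3, 0.3]`, identical single-sample columns `[[1.0], [1.0]]`, and the ternary alphabet `[-1.0, 0.0, 1.0]`, the sketch returns `[0.0, 1.0]`: the second weight is rounded up to compensate for the error left by the first, whereas independent rounding would map both weights to zero.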
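Many payment platforms run large-scale marketing campaigns that allocate incentives to encourage users to pay through their applications. To maximize the return on investment, incentive allocation is commonly solved in a two-stage procedure. After a response estimation model is trained to estimate each user's mobile payment probability (MPP), a linear programming process is applied to obtain the optimal incentive allocation. However, the large amount of biased data in the training set, generated by the previous biased allocation policy, leads to biased estimation. This bias degrades the performance of the response model and misleads the linear programming process, significantly reducing the performance of the resulting allocation policy. To overcome this obstacle, we propose a bias correction adversarial network. Our method leverages a small set of unbiased data obtained under a fully randomized allocation policy to train an unbiased model, and then uses it to reduce the bias through adversarial learning. Offline and online experimental results show that our method outperforms state-of-the-art approaches and significantly improves the performance of the resulting allocation policy in a real-world marketing campaign.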
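The Text-to-SQL task, which aims to translate natural language questions into SQL queries, has drawn much attention recently. One of the most challenging problems in Text-to-SQL is how to generalize a trained model to unseen database schemas, also known as the cross-domain Text-to-SQL task. The key lies in the generalizability of (i) the encoding method used to model the question and the database schema and (ii) the question-schema linking method used to learn the mapping between words in the question and tables/columns in the database schema. Focusing on these two key issues, we propose a Structure-Aware Dual Graph Aggregation Network (SADGA) for cross-domain Text-to-SQL. In SADGA, we adopt the graph structure to provide a unified encoding model for both the natural language question and the database schema. Based on this unified modeling, we further devise a structure-aware aggregation method to learn the mapping between the question graph and the schema graph. The structure-aware aggregation method features Global Graph Linking, Local Graph Linking, and a Dual-Graph Aggregation Mechanism. We not only study the performance of our proposal empirically but also achieved 3rd place on the challenging Text-to-SQL benchmark Spider at the time of writing.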
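Although well established in general reinforcement learning (RL), value-based methods are rarely explored in constrained RL (CRL) because they cannot find policies that randomize among multiple actions. To apply value-based methods to CRL, recent game-theoretic approaches employ a mixed policy that randomizes among a set of carefully generated policies to converge to the desired constraint-satisfying policy. However, these approaches require storing a large set of policies, which is not policy efficient and may incur prohibitive memory costs in constrained deep RL. To address this problem, we propose an alternative approach. Our approach first reformulates CRL as an equivalent distance optimization problem. With a specially designed linear optimization oracle, we derive a meta-algorithm that solves it using any off-the-shelf RL algorithm and any conditional gradient (CG) type algorithm as subroutines. We then propose a new variant of CG-type algorithms that generalizes the minimum norm point (MNP) method. The proposed method matches the convergence rate of existing game-theoretic approaches and achieves worst-case optimal policy efficiency. Experiments on a navigation task show that our method reduces memory costs by an order of magnitude while achieving better performance, demonstrating both its effectiveness and efficiency.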
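The role of a linear optimization oracle inside a conditional gradient method can be illustrated on a toy distance problem: finding the point of a convex hull closest to a target. The sketch below is the textbook Frank-Wolfe iteration with exact line search, not the MNP-style variant proposed in the paper, and the oracle here simply scans a finite vertex list, whereas in the CRL setting described above an RL algorithm would play that part. All names are illustrative.

```python
def frank_wolfe_projection(target, vertices, max_iters=100, tol=1e-12):
    """Minimize 0.5 * ||x - target||^2 over the convex hull of `vertices`."""
    x = list(vertices[0])
    for _ in range(max_iters):
        grad = [xi - bi for xi, bi in zip(x, target)]
        # Linear optimization oracle: the vertex minimizing <grad, v>.
        s = min(vertices, key=lambda v: sum(g * vi for g, vi in zip(grad, v)))
        d = [si - xi for si, xi in zip(s, x)]
        # Frank-Wolfe gap <grad, x - s>; it is zero exactly at an optimum.
        gap = -sum(g * di for g, di in zip(grad, d))
        if gap <= tol:
            break
        # Exact line search for a quadratic objective, clipped to [0, 1].
        gamma = min(1.0, gap / sum(di * di for di in d))
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return x
```

Each iteration only needs one oracle call and a convex combination with the previous iterate, so the number of distinct vertices that contribute to the final point is bounded by the number of iterations. This is the quantity that policy efficiency is concerned with when vertices correspond to stored policies.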
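We develop a novel framework that adds sparse group lasso regularizers to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, and AdaHessian, and creates a new class of optimizers, named accordingly Group Momentum, Group Adagrad, Group Adam, Group AMSGrad, Group AdaHessian, and so on. We establish theoretically proven convergence guarantees in the stochastic convex setting, based on primal-dual methods. We evaluate the regularization effect of the new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results show that, compared with the original optimizers followed by a post-processing step that uses magnitude pruning, model performance can be significantly improved at the same sparsity level. Furthermore, compared with the case without magnitude pruning, our methods can achieve extremely high sparsity with significantly better or highly competitive performance.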
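As background on the regularizer itself, the sparse group lasso penalty on one weight group, `lam_l1 * ||w||_1 + lam_l2 * ||w||_2`, has a closed-form proximal operator: element-wise soft-thresholding followed by shrinkage of the whole group. The sketch below shows that operator only. How the proposed optimizers combine it with momentum or adaptive step sizes is not described in the abstract, so the function names and parameters here are illustrative assumptions.

```python
import math


def soft_threshold(x, lam):
    """Shrink x towards zero by lam (proximal operator of lam * |x|)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def sparse_group_lasso_prox(group, lam_l1, lam_l2):
    """Proximal operator of lam_l1 * ||w||_1 + lam_l2 * ||w||_2 on one group."""
    # Step 1: the L1 term zeroes out individual small weights.
    shrunk = [soft_threshold(w, lam_l1) for w in group]
    # Step 2: the L2 term zeroes out the whole group if its norm is small,
    # and otherwise scales it down uniformly.
    norm = math.sqrt(sum(s * s for s in shrunk))
    if norm <= lam_l2:
        return [0.0] * len(group)
    scale = 1.0 - lam_l2 / norm
    return [scale * s for s in shrunk]
```

Because an entire group (for example, one embedding vector) can be driven exactly to zero during training, sparsity is obtained without a separate pruning pass, which is the comparison the experiments above are concerned with.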
A storyboard is a roadmap for video creation that consists of shot-by-shot images visualizing the key plots in a text synopsis. Creating video storyboards, however, remains challenging: it requires not only associating high-level text with images but also long-term reasoning to make transitions smooth across shots. In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS), which aims to retrieve an ordered sequence of images to visualize the text synopsis. We construct a MovieNet-TeViS benchmark based on the public MovieNet dataset. It contains 10K text synopses, each paired with keyframes that are manually selected from the corresponding movies by considering both relevance and cinematic coherence. We also present an encoder-decoder baseline for the task. The model uses a pretrained vision-and-language model to improve high-level text-image matching. To improve coherence in long-term shots, we further propose to pre-train the decoder on large-scale movie frames without text. Experimental results demonstrate that our proposed model significantly outperforms other models in creating text-relevant and coherent storyboards. Nevertheless, there is still a large gap compared to human performance, suggesting room for promising future work.
Cooperative multi-agent reinforcement learning (c-MARL) is widely applied in safety-critical scenarios, so analyzing the robustness of c-MARL models is profoundly important. However, robustness certification for c-MARL has not yet been explored in the community. In this paper, we propose a novel certification method, the first work to leverage a scalable approach for c-MARL to determine actions with guaranteed certified bounds. c-MARL certification poses two key challenges compared with single-agent systems: (i) the uncertainty accumulates as the number of agents increases; (ii) changing the action of a single agent may have little impact on the global team reward. These challenges prevent us from directly using existing algorithms. Hence, we employ a false discovery rate (FDR) controlling procedure that considers the importance of each agent to certify per-state robustness, and we propose a tree-search-based algorithm to find a lower bound of the global reward under the minimal certified perturbation. As our method is general, it can also be applied in single-agent environments. We empirically show that our certification bounds are much tighter than those of state-of-the-art RL certification solutions. We also run experiments on two popular c-MARL algorithms, QMIX and VDN, in two different environments, with two and four agents. The experimental results show that our method produces meaningful guaranteed robustness for all models and environments. Our tool CertifyCMARL is available at https://github.com/TrustAI/CertifyCMA
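The abstract does not say which FDR controlling procedure is used. As an illustration only, the sketch below implements the standard Benjamini-Hochberg step-up procedure over a list of p-values, for instance one hypothetical p-value per agent. The function name, the default `alpha`, and the mapping from agents to p-values are assumptions, not details from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k (1-based, p-values sorted ascending)
    with p_(k) <= alpha * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

Testing several agents jointly in this way controls the expected share of false rejections across the group, which is less conservative than applying a per-agent threshold corrected for the number of agents.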